Test validity refers to the degree to which the inferences based on test scores are 
meaningful, useful, and appropriate. Thus test validity is a characteristic of a test when it 
is administered to a particular population. Validating a test refers to accumulating 
empirical data and logical arguments to show that the inferences are indeed 
appropriate.

This article introduces the modern concepts of validity advanced by the late Samuel 
Messick (1989, 1996a, 1996b). We start with a brief review of the traditional methods of 
gathering validity evidence.

TRADITIONAL CONCEPT OF VALIDITY 



Traditionally, the various means of accumulating validity evidence have been grouped 
into three categories -- content-related, criterion-related, and construct-related evidence 
of validity. These broad categories are a convenient way to organize and discuss 
validity evidence. There are no rigorous distinctions between them; they are not distinct 
types of validity. Evidence normally identified with the criterion-related or content-related 
categories, for example, may also be relevant as construct-related evidence.



* Criterion-related validity evidence - seeks to demonstrate that test scores are 
systematically related to one or more outcome criteria. In terms of an achievement test, 
for example, criterion-related validity may refer to the extent to which a test can be used 
to draw inferences regarding achievement. Empirical evidence in support of 
criterion-related validity may include a comparison of performance on the test against 
performance on outside criteria such as grades, class rank, other tests, and teacher 
ratings.



* Content-related validity evidence - refers to the extent to which the test questions 
represent the skills in the specified subject area. Content validity is often evaluated by 
examining the plan and procedures used in test construction. Did the test development 
procedure follow a rational approach that ensures appropriate content? Did the process 
ensure that the collection of items would represent appropriate skills? 



* Construct-related validity evidence - refers to the extent to which the test measures 
the "right" psychological constructs. Intelligence, self-esteem and creativity are 
examples of such psychological traits. Evidence in support of construct-related validity 
can take many forms. One approach is to demonstrate that the items within a measure 
are inter-related and therefore measure a single construct. Inter-item correlation and 
factor analysis are often used to demonstrate relationships among the items. Another 
approach is to demonstrate that the test behaves as one would expect a measure of the 
construct to behave. For example, one might expect a measure of creativity to show a 
greater correlation with a measure of artistic ability than with a measure of scholastic 
achievement. 
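
The correlational evidence mentioned in the criterion-related and construct-related items 
above can be sketched in a few lines of Python. This is a minimal illustration under stated 
assumptions, not a prescribed procedure: the function names `pearson_r` and 
`mean_inter_item_r` and all data values are hypothetical, scores are assumed to be numeric 
and listed in the same examinee order for every measure, and Pearson correlation is used 
as the index of relationship. Factor analysis, the other method mentioned, is not shown.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation between two equal-length lists of scores."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two equal-length score lists with at least two entries")
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    # Raises ZeroDivisionError if either list is constant (correlation undefined).
    return sxy / sqrt(sxx * syy)


def mean_inter_item_r(items):
    """Average Pearson correlation over every pair of items.

    `items` is a list of items; each item is a list of scores, one per examinee,
    with examinees in the same order for every item.
    """
    pairs = list(combinations(items, 2))
    if not pairs:
        raise ValueError("need at least two items")
    return sum(pearson_r(a, b) for a, b in pairs) / len(pairs)


# Hypothetical data for six examinees (illustration only, not real results).
test_scores = [55, 62, 70, 78, 85, 91]
grades = [2.1, 2.4, 2.9, 3.0, 3.6, 3.8]  # outside criterion
items = [
    [1, 2, 2, 3, 4, 4],  # item 1 scores
    [1, 1, 2, 3, 3, 4],  # item 2 scores
    [2, 2, 3, 3, 4, 5],  # item 3 scores
]

print("criterion-related r:", round(pearson_r(test_scores, grades), 2))
print("mean inter-item r:", round(mean_inter_item_r(items), 2))
```

Neither coefficient settles the question by itself; in keeping with the view that these 
categories are not distinct types of validity, such numbers are read together with the other 
kinds of evidence described here.
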

MODERN CONCEPT OF VALIDITY



Messick (1989, 1996a, 1996b) argues that the traditional conception of validity is 
fragmented and incomplete, especially because it fails to take into account both 
evidence of the value implications of score meaning as a basis for action and the social 
consequences of score use. His modern approach views validity as a unified concept 
that places a heavier emphasis on how a test is used. Six distinguishable aspects of 
validity are highlighted as a means of addressing central issues implicit in the notion of 
validity as a unified concept. In effect, these six aspects conjointly function as general 
validity criteria or standards for all educational and psychological measurement. They 
must be viewed as interdependent and complementary forms of validity evidence, not 
as separate and substitutable validity types. Following Messick (1996b), the six aspects 
are:

* Content - A key issue for the content aspect of validity is determining the knowledge, 
skills, and other attributes to be revealed by the assessment tasks. Content standards 
themselves should be relevant to and representative of the construct domain. Increasing 
achievement levels or performance standards should reflect increases in the complexity of 
the construct under scrutiny, not increasing sources of construct-irrelevant difficulty 
(Messick, 1996a).

* Substantive - The substantive aspect of validity emphasizes the verification of the 
domain processes to be revealed in assessment tasks. These can be identified through 
the use of substantive theories and process modeling (Embretson, 1983; Messick, 
1989). When evaluating the substantive aspect of a test, one should consider two points. 
First, the assessment tasks must provide an appropriate sampling of domain processes 
in addition to the traditional coverage of domain content. Second, the engagement of 
these sampled processes in the assessment tasks must be confirmed by the 
accumulation of empirical evidence.

* Structure - Scoring models should be rationally consistent with what is known about the 
structural relations inherent in behavioral manifestations of the construct in question 
(Loevinger, 1957). The manner in which the execution of tasks is assessed and 
scored should be based on how the implicit processes of the respondent's actions 
combine dynamically to produce effects. Thus, the internal structure of the assessment 
should be consistent with what is known about the internal structure of the construct 
domain (Messick, 1989).

* Generalizability - Assessments should provide representative coverage of the content 
and processes of the construct domain. This allows score interpretations to be broadly 
generalizable within the specified construct. Evidence of such generalizability depends 
on the degree to which the tasks correlate with other tasks that also represent the 
construct or aspects of the construct.

* External Factors - The external aspect of validity refers to the extent to which the 
assessment scores' relationships with other measures and nonassessment behaviors 
reflect the expected high, low, and interactive relations implicit in the specified construct. 
Thus, the score interpretation is substantiated externally by appraising the degree to 
which empirical relationships are consistent with that meaning.

* Consequential - The consequential aspect of validity includes evidence and rationales 
for evaluating the intended and unintended consequences of score interpretation and 
use. It is important to accrue evidence of positive consequences as well as evidence 
that adverse consequences are minimal. This type of investigation is especially 
important when it concerns adverse consequences for individuals and groups that are 
associated with bias in scoring and interpretation.

These six aspects of validity apply to all educational and psychological measurement; 
most score-based interpretations and action inferences either invoke these properties or 
assume them, explicitly or tacitly. The challenge in test validation, then, is to link these 
inferences to convergent evidence that supports them as well as to discriminant 
evidence that discounts plausible rival inferences.

SOURCES OF INVALIDITY 



Two major threats to test validity are worth noting, especially with today's emphasis on 
high-stakes performance tests. 

"Construct underrepresentation" means that the tasks measured in the assessment fail 
to include important dimensions or facets of the construct. Therefore, the test results are 
unlikely to reveal a student's true abilities within the construct the test is claimed to 
measure.

"Construct-irrelevant variance" means that the test measures too many variables, many 
of which are irrelevant to the interpreted construct. This type of invalidity can take two 
forms, "construct-irrelevant easiness" and "construct-irrelevant difficulty." 
"Construct-irrelevant easiness" occurs when extraneous clues in item or task formats 
permit some individuals to respond correctly or appropriately in ways that are irrelevant 
to the construct being assessed; "construct-irrelevant difficulty" occurs when extraneous 
aspects of the task make the task irrelevantly difficult for some individuals or groups. 
Construct-irrelevant easiness leads affected individuals to score higher than their standing 
on the construct warrants, whereas construct-irrelevant difficulty leads to notably lower 
scores than warranted.
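
The direction of the two kinds of construct-irrelevant variance can be shown with a toy 
additive model. The model is an assumption made here for illustration, not Messick's 
formulation: an observed score is the construct-relevant score plus a construct-irrelevant 
shift, positive for easiness and negative for difficulty. The function name 
`observed_scores` and all numbers are hypothetical.

```python
def observed_scores(construct_scores, irrelevant_shift):
    """Toy additive model (illustrative assumption): observed score equals the
    construct-relevant score plus a construct-irrelevant shift. A positive shift
    stands for construct-irrelevant easiness (e.g., extraneous clues in the item
    format); a negative shift stands for construct-irrelevant difficulty."""
    return [s + irrelevant_shift for s in construct_scores]


def mean(values):
    return sum(values) / len(values)


# Two hypothetical groups with identical standing on the construct.
construct_scores = [40, 50, 60, 70]
helped_by_clues = observed_scores(construct_scores, 5)
hindered_by_task_features = observed_scores(construct_scores, -8)

# The whole observed gap comes from construct-irrelevant variance,
# since the construct-relevant scores are the same for both groups.
gap = mean(helped_by_clues) - mean(hindered_by_task_features)
print("mean construct score:", mean(construct_scores))
print("observed group gap:", gap)
```

Real construct-irrelevant effects are rarely a constant shift; the sketch only makes the 
point that an observed score difference need not reflect any difference on the construct 
being assessed.
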

Because task responses depend, to varying degrees, on the processes, strategies, and 
knowledge implicated in task performance, one should be able to identify, through 
cognitive-process analysis, the theoretical mechanisms underlying task performance 
(Embretson, 1983).